Computer system with improved data capture system

ABSTRACT

A computer system has a plurality of sub-systems each comprising a serial interface and a buffer device coupled with the serial interface for buffering crash data sent by the serial interface having an external serial output. The system further comprises a management controller coupled with the external serial output of the buffer device to retrieve data buffered during a crash.

FIELD OF THE INVENTION

[0001] The present invention relates to a computer system, including a plurality of independent sub-systems, a control sub-system and an improved mechanism for handling a crash on a sub-system.

BACKGROUND OF THE INVENTION

[0002] Today's computer systems, in particular server systems, often comprise a plurality of sub-systems. Each sub-system can be an independent computer system running its own operating system. For example, a sub-system can comprise a multiple processor architecture running a WINDOWS® operating system. These sub-systems can thus be fully operational computer systems, for example, personal computers or servers which could be coupled with a keyboard, mouse, monitor, etc. A plurality of those sub-systems can be linked and coordinated through a specific dedicated management bus system which is coupled to an embedded server management controller. To this end, each sub-system comprises a so-called bridge to couple with the dedicated bus system. As such a dedicated bus system does not need to transfer a lot of data and no critical high speed transfer is required, such a management controller often uses, for cost reasons, only a standard two wire serial interface. A respective interface is also provided within a bridge of each sub-system. In addition, the respective operating system might have restrictions with respect to certain communication paths, in particular in system crash situations. For example, the above mentioned WINDOWS operating system uses one of its serial communication ports to dump crash data.

[0003] Whenever the operating system of one of the sub-systems crashes, it dumps a plurality of data, such as data indicating the circumstances of the crash, through the serial interface. However, in a system with multiple sub-systems the management controller is responsible for providing in-band and out-of-band server management for all installed sub-systems. To this end, the management controller only enables one serial interface on one sub-system at a time and switches between different sub-systems at regular intervals. Hence, the management controller provides a multiplexed console redirection to the remote sub-systems. Even though this architecture is satisfactory during normal operation, whenever one system fails it cannot always be guaranteed that the management controller receives all necessary data to be able to identify the respective details of a sub-system's failure.
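
The limitation described in paragraph [0003] can be illustrated with a small simulation. The sketch below is a simplified model under stated assumptions, not part of the described system: time is counted in byte periods, the controller enables each serial interface in turn for a fixed slot, and a crashing sub-system sends its dump once without retransmission. The function name and all parameters are hypothetical.

```python
def received_bytes(dump_start, dump_len, slot_len, n_subsystems, crashed):
    """Count the dump bytes a round-robin multiplexing controller would see.

    Assumed model: at byte time t the controller listens to sub-system
    (t // slot_len) % n_subsystems. Bytes sent while another sub-system
    is selected are lost, because nothing buffers them.
    """
    seen = 0
    for t in range(dump_start, dump_start + dump_len):
        if (t // slot_len) % n_subsystems == crashed:
            seen += 1
    return seen


# Four sub-systems, 10 byte periods per slot, 100-byte dump from sub-system 0.
partial = received_bytes(dump_start=0, dump_len=100, slot_len=10,
                         n_subsystems=4, crashed=0)
# With a single sub-system the interface is always enabled.
complete = received_bytes(dump_start=0, dump_len=100, slot_len=10,
                          n_subsystems=1, crashed=0)
```

In this model the controller sees 30 of the 100 dump bytes in the four sub-system case and all 100 in the single sub-system case, which is one way to read why unbuffered multiplexing cannot guarantee a complete dump.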

SUMMARY OF THE INVENTION

[0004] Therefore, a need exists for an improved multiple sub-system server architecture which overcomes the above mentioned problems.

[0005] A first embodiment comprises a computer system having a plurality of sub-systems each comprising a serial interface and a buffer device coupled with the serial interface for buffering crash data sent by the serial interface having an external serial output. The system further comprises a management controller coupled with the external serial output of the buffer device to retrieve data buffered during a crash.

[0006] A method of operating a computer system comprising a plurality of sub-systems each running independently an operating system and a management controller coupled with the plurality of sub-systems, the method comprising the steps of:

[0007] upon a system crash, outputting a crash dump file through a serial port of the sub-system;

[0008] buffering the crash dump file;

[0009] generating a control signal for a management controller;

[0010] upon request by the management controller, coupling the management controller with the sub-system; and

[0011] transferring the buffered crash dump file to the management controller.

[0012] Yet another embodiment of a computer system comprises a plurality of independent sub-systems each running an operating system that outputs a crash dump through a serial port and generates a control signal upon a system crash, a management controller having a control input, a serial bus interface coupled with a communication line, and a serial input. Each sub-system comprises a microcontroller having a control input, a memory, and a serial input port coupled with the serial port and a serial output port, a controller unit having a serial bus interface for coupling with the management controller and an input for receiving the control signal and generating an external control signal fed to the control input of the management controller and an output for an internal control signal fed to the microcontroller, and a switch controlled by the controller unit for coupling the serial output port with the external communication line.

[0013] Other technical advantages of the present disclosure will be readily apparent to one skilled in the art from the following figures, descriptions, and claims. Various embodiments of the present application obtain only a subset of the advantages set forth. No one advantage is critical to the embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] A more complete understanding of the present disclosure and advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, in which like reference numbers indicate like features, and wherein:

[0015] FIG. 1 is a block diagram of an exemplary embodiment according to the present invention;

[0016] FIG. 2 is a block diagram of another exemplary embodiment according to the present invention;

[0017] FIG. 3 is a block diagram showing parts of a single sub-system in more detail;

[0018] FIG. 4 is another embodiment of a single sub-system; and

[0019] FIG. 5 is a flow chart showing a method to retrieve crash dump data according to one of the embodiments of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0020] Turning to the drawings, exemplary embodiments of the present application will now be described. FIG. 1 shows a block diagram of a computer server system 100. Such a system comprises a plurality of sub-systems 110, 120, . . . 130. Each sub-system 110, 120, . . . 130 is an independent computer system, such as a personal computer or a single server. Usually only the motherboards of these computers or servers are used and placed into a rack or tower system. Every sub-system 110, 120, . . . 130 comprises the respective components 111, such as a central processing unit, memory, peripheral interfaces, etc. Usually only one keyboard, mouse and monitor (not shown) are coupled with a keyboard-mouse-monitor managing unit (KVM, not shown) which selectively couples the KVM with one of the sub-systems.

[0021] In FIG. 1 a serial interface is indicated by numeral 112. This serial interface 112 is one of the peripherals which are on the motherboard of a computer system and is usually a standard RS232 compatible serial interface. In addition, according to the present invention, a serial buffer 113 is provided. The serial buffer 113 is internally coupled to the serial interface and buffers all outgoing data. The serial buffer of each sub-system 110, 120, . . . 130 is coupled through a bus system 160, such as a SPI-bus, I²C-bus, Micro-wire, universal serial bus (USB) or any other suitable serial bus system, with a management controller 150. In addition, each system can generate a non-maskable interrupt which is further directed to the server management controller through interrupt lines 114, 121, . . . 131.

[0022] In operation, the management controller 150 is activated through one of the interrupt lines 114, 121, . . . 131. Whenever the operating system in one of the sub-systems 110, 120, . . . 130 crashes, it dumps its crash data through the serial interface. This dump is usually relatively uncontrolled, as the operating system at this moment, due to the nature of a crash, no longer operates very reliably. Each operating system has its own procedure of sending such a “last call” before completely shutting down. Many operating systems use the monitor to indicate to a user what happened and in addition send a detailed dump through the serial interface. However, the monitor dump or signal is not very useful, as the respective sub-system might not be connected to the KVM at the time of a crash. According to the present invention, a serial buffer is installed on each sub-system 110, 120, . . . 130, which receives and stores the crash dump data. Through the interrupt lines 114, 121, . . . 131 the respective sub-system that crashed indicates to the management controller 150 that its operating system crashed. After this signal has been sent, the management controller 150 retrieves the respective crash data from the respective sub-system through bus 160. Thus, no data will be lost and the system crash within one of the sub-systems 110, 120, . . . 130 can be fully evaluated. In another embodiment, the buffer 113 can comprise only a standard serial interface. In this case, the management controller 150 can comprise a plurality of serial interfaces to connect to each sub-system 110, 120, . . . 130 or a single serial interface and a controllable switch which selectively couples the controller 150 with one of the serial interfaces of the sub-systems 110, 120, . . . 130.

[0023] FIG. 2 shows another embodiment of a server system 200 according to the present invention. The server system 200 comprises a plurality of sub-systems 210, 220, . . . 230. Each sub-system 210, 220, . . . 230 includes amongst the usual system components a system I/O 212, for example a serial communication port such as a COM port in a personal computer system for generating serial transmission signals and serial port control signals. Furthermore, a bridge 211 is implemented for generating other control signals, such as interrupt signals. A special interface device is a communication controller 216 which receives the signals from bridge 211 and from the system I/O and generates a plurality of internal and external control signals. It is also used to communicate with the external management controller 150 through a serial bus system 250 and through interrupt lines 217, 227, . . . 237. A microcontroller 214 comprises a serial port which is coupled with the system I/O serial port 212. The microcontroller 214 comprises its own memory and peripherals. A switch 215 is coupled with another serial port of microcontroller 214. The switch 215 is furthermore coupled with a serial input of management controller 150 through serial coupling 240.

[0024] In this embodiment a standard motherboard including some modifications is used. The additional microcontroller 214 on each sub-system 210, 220, . . . 230 is used to buffer any type of crash dump so it will not be lost and can be retrieved at a later time. Such a microcontroller does not have to be a high performance microcontroller and can comprise, for example, one or two standard serial ports (RS232) and sufficient dynamic or static memory to buffer the outgoing crash dump. If only one serial port is implemented, the receiving line (RX, input) will be coupled with the COM port of the sub-system 210, 220, . . . 230 and the transmitting line (TX, output) will be connected with the switch 215. Once a system crash occurs in one of the sub-systems 210, 220, . . . 230, the respective bridge 211 asserts a non-maskable interrupt and the request to send signal RTS is generated by the system I/O. The communication controller 216 then signals to the microcontroller 214 that data will be sent to the microcontroller 214. The microcontroller then transfers the incoming data to its memory for later retrieval. While this is happening, the communication controller asserts an external interrupt which is fed to the management controller 150. The management controller 150 then starts the retrieving procedure. It first sends a command through the serial bus addressing the sub-system which sent the interrupt and prepares itself for reception of the crash dump file. The communication controller 216 sends a respective command to the microcontroller 214 to initiate a data transfer through its second serial interface. In addition, communication controller 216 activates the switch 215. Thus, only one transmitter is coupled to the serial connecting line 240 and no data collision can occur. Next, the microcontroller 214 sends the crash dump previously stored in its memory to the management controller 150, which will further analyze this data. After completion of the transfer, the communication controller 216 controls the switch 215 to decouple the serial port from the connecting line 240.
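
The retrieval sequence of paragraph [0024] can be sketched in software. The following is a minimal model under stated assumptions, not an implementation of the hardware: the class and method names are hypothetical, the communication controller and microcontroller of one sub-system are folded into a single `SubSystem` object, and the connecting line 240 is modeled as an object that rejects a second transmitter.

```python
class SharedSerialLine:
    """Stands in for connecting line 240; at most one transmitter is coupled."""

    def __init__(self):
        self.driver = None
        self.received = bytearray()

    def couple(self, name):
        if self.driver is not None:
            raise RuntimeError("collision: line already driven by " + self.driver)
        self.driver = name

    def decouple(self, name):
        if self.driver == name:
            self.driver = None

    def send(self, name, data):
        if self.driver != name:
            raise RuntimeError(name + " is not coupled to the line")
        self.received.extend(data)


class SubSystem:
    def __init__(self, name, line):
        self.name = name
        self.line = line
        self.buffer = bytearray()  # microcontroller memory
        self.interrupt = False     # external interrupt to the management controller

    def crash(self, dump):
        # The dump is buffered first, then the external interrupt is raised.
        self.buffer.extend(dump)
        self.interrupt = True

    def handle_retrieve_command(self):
        # Close the switch, send the buffered dump, then reopen the switch.
        self.line.couple(self.name)
        try:
            self.line.send(self.name, bytes(self.buffer))
        finally:
            self.line.decouple(self.name)
        self.buffer.clear()
        self.interrupt = False


def service_interrupts(subsystems, line):
    """Management controller side: address each interrupting sub-system in turn."""
    dumps = {}
    for sub in subsystems:
        if sub.interrupt:
            line.received.clear()
            sub.handle_retrieve_command()
            dumps[sub.name] = bytes(line.received)
    return dumps


line = SharedSerialLine()
a, b = SubSystem("210", line), SubSystem("220", line)
a.crash(b"dump from 210")
b.crash(b"dump from 220")
dumps = service_interrupts([a, b], line)
```

Because each sub-system couples to the line only while it is being addressed, two sub-systems that crash at nearly the same time are served one after the other and neither dump is lost in this model.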

[0025] FIG. 3 shows the relevant parts of a sub-system 210, 220, . . . 230 in more detail. A motherboard 300 comprises a serial port 302 forming the system I/O, such as the COM2 port of a personal computer or server. Furthermore, a control interface 301 forming the bridge is implemented to provide other control signals, such as an interrupt signal. The communication controller 330 is coupled with the serial interface control signals and furthermore can receive signals from a voltage detector 310 and a temperature sensor 320. The communication controller 330 provides a serial bus interface 380, such as a SPI-bus, an I²C bus, or any other suitable serial bus interface. Furthermore, an interrupt signal 370 is generated by the communication controller 330. The microcontroller comprises amongst others a central processing unit 340 (CPU) which is coupled with the serial port 302 through a first serial interface 345. Furthermore, the microcontroller has a memory 350 which is coupled with its CPU 340. A second serial interface 360 of the microcontroller is coupled with the CPU and a switch 365 which couples the serial interface 360 with the external serial communication line 390. The switch is controlled by the sub-system communication controller 330.

[0026] The management controller 150 is primarily responsible for determining the environmental status of the server system. To this end, for example, each sub-system comprises respective voltage sensors 310 and temperature sensors 320. The sub-system communication controller 330 constantly provides the management controller 150 with information about the supply voltage and the temperature of each individual system. According to the present invention, the management controller is also responsible for documenting and analyzing a system crash of one of the sub-systems. As each sub-system dumps the respective crash dump file through its serial port upon a crash, the CPU 340 of the microcontroller buffers this file in its memory 350 and transfers it upon request to the management controller 150 as described above.

[0027] FIG. 4 shows another possible embodiment. Only the relevant parts which are different from FIGS. 2 and 3 are shown in FIG. 4. Here, the CPU 400 is coupled again through a standard RS232 serial interface 440 with the sub-system (not shown). Again, the CPU 400 is coupled with a memory 450. In contrast to FIG. 3, this embodiment uses a serial bus interface 460, such as a universal serial bus (USB) or an I²C bus interface, which couples the CPU 400 with an external USB-bus 470. Thus, no switch is needed in this embodiment, as all serial interfaces can be coupled with the serial bus 470. The respective arbitration protocol of the serial bus system will prevent any type of collision that would render the data unusable.

[0028] In case of a system crash of one of the sub-systems, the CPU 400 will again buffer all crash related data within its memory 450. Once the CPU has stored all crash data in a respective file it can transmit a command through the serial bus interface 460 to the management controller 150 indicating that a crash occurred. This message can include identification data about the respective sub-system. As soon as the management controller is ready to receive the crash dump it can indicate this to the respective sub-system by sending a command through the serial bus interface 460. Upon reception of this command, the CPU 400 transfers the crash dump file from its memory 450 to the external management controller. In this embodiment, the sub-system controller can be minimized. There is no need to provide a serial bus interface in this controller as the serial bus interface 460 of the microcontroller can be used. In addition, no external interrupt signal has to be generated. The internal interrupt can be directly fed to the CPU 400.
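
The command exchange of paragraph [0028] can be sketched as a three-message handshake. This is an illustrative model only: the message names `CRASH_NOTICE`, `SEND_DUMP` and `DUMP_DATA`, the address `"mc"`, and all classes are hypothetical, and bus arbitration is reduced to an ordered queue that delivers one message at a time.

```python
from collections import deque

# Hypothetical message kinds; the source does not name the commands.
CRASH_NOTICE, SEND_DUMP, DUMP_DATA = "CRASH_NOTICE", "SEND_DUMP", "DUMP_DATA"


class Bus:
    """Stands in for serial bus 470: messages are delivered one at a time, in order."""

    def __init__(self):
        self.queue = deque()

    def post(self, src, dst, kind, payload=b""):
        self.queue.append((src, dst, kind, payload))


class BufferCpu:
    """Stands in for CPU 400 with memory 450 on one sub-system."""

    def __init__(self, ident, bus):
        self.ident, self.bus, self.memory = ident, bus, b""

    def on_crash(self, dump):
        self.memory = dump                              # buffer the dump first
        self.bus.post(self.ident, "mc", CRASH_NOTICE)   # then announce the crash

    def on_message(self, kind, payload):
        if kind == SEND_DUMP:
            self.bus.post(self.ident, "mc", DUMP_DATA, self.memory)


class ManagementController:
    def __init__(self, bus):
        self.bus, self.dumps = bus, {}

    def on_message(self, src, kind, payload):
        if kind == CRASH_NOTICE:
            self.bus.post("mc", src, SEND_DUMP)         # signal readiness to receive
        elif kind == DUMP_DATA:
            self.dumps[src] = payload


def run(bus, controller, cpus):
    """Deliver queued messages until the bus is idle."""
    while bus.queue:
        src, dst, kind, payload = bus.queue.popleft()
        if dst == "mc":
            controller.on_message(src, kind, payload)
        else:
            cpus[dst].on_message(kind, payload)


bus = Bus()
mc = ManagementController(bus)
cpus = {name: BufferCpu(name, bus) for name in ("sub-1", "sub-2")}
cpus["sub-2"].on_crash(b"dump-2")
cpus["sub-1"].on_crash(b"dump-1")
run(bus, mc, cpus)
```

The sender identifier carried with each message plays the role of the identification data mentioned above, so in this sketch no separate interrupt line or switch is modeled.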

[0029] FIG. 5 shows a flow chart of the procedure with an embodiment according to FIGS. 2 and 3. In step 500 the status of the non-maskable interrupt or the flow control signals is monitored. If such a signal is asserted the procedure proceeds to step 510 where the buffer system is enabled. In step 520 the external interrupt will be asserted, signaling to the management system that a system crash in one of the systems occurred. In step 530 the management controller reads the sub-system communication controller's information, thus discovering that the particular system crashed. Finally, in step 540, the management system retrieves the crash data from the microcontroller's memory.
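
The order of the steps in FIG. 5 can be written down as a short trace function. This is only a sketch: the function name and its two boolean inputs are hypothetical, and each step is reduced to its reference numeral from the flow chart.

```python
def crash_capture_steps(nmi_asserted, flow_control_asserted):
    """Return the FIG. 5 step numbers visited in one pass of the procedure."""
    steps = [500]  # monitor the non-maskable interrupt and flow control signals
    if not (nmi_asserted or flow_control_asserted):
        return steps  # nothing asserted, so monitoring simply continues
    steps.append(510)  # enable the buffer system
    steps.append(520)  # assert the external interrupt
    steps.append(530)  # management controller reads the communication controller
    steps.append(540)  # management controller retrieves the buffered crash data
    return steps


idle_trace = crash_capture_steps(False, False)
crash_trace = crash_capture_steps(True, False)
```

Either signal is treated as sufficient to leave step 500, following the wording "the non-maskable interrupt or the flow control signals".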

[0030] The present invention has the particular advantage that a standard motherboard can be used without modification. In such an application an additional microcontroller can be provided as an extension on a specifically designed card within the housing of the server, for each sub-system separately or combined, depending on whether sub-system specific sensors are used or not.

[0031] The invention, therefore, is well adapted to carry out the objects and attain the ends and advantages mentioned, as well as others inherent therein. While the invention has been depicted, described, and is defined by reference to exemplary embodiments of the invention, such references do not imply a limitation on the invention, and no such limitation is to be inferred. The invention is capable of considerable modification, alteration, and equivalents in form and function, as will occur to those ordinarily skilled in the pertinent arts and having the benefit of this disclosure. The depicted and described embodiments of the invention are exemplary only, and are not exhaustive of the scope of the invention. Consequently, the invention is intended to be limited only by the spirit and scope of the appended claims, giving full cognizance to equivalents in all respects.

What is claimed is:
1. Computer system comprising: a plurality of sub-systems each comprising: a serial interface; and a buffer device coupled with the serial interface for buffering crash data sent by the serial interface having an external serial output; a management controller coupled with the external serial output of the buffer device to retrieve data buffered during a crash.
2. Computer system according to claim 1, wherein each sub-system further comprises: a microcontroller having a memory and a serial input coupled with the serial interface and a serial output; a communication controller; a switch coupled with the serial output, wherein the switch is controlled by the communication controller.
3. Computer system according to claim 1, wherein the serial output is part of a RS232 serial interface.
4. Computer system according to claim 1, wherein the serial output is part of a universal bus serial interface.
5. Computer system according to claim 2, wherein the communication controller is coupled with the management controller through a serial bus.
6. Computer system according to claim 1, wherein the sub-system generates an interrupt signal fed to the management controller.
7. Computer system according to claim 2, wherein the sub-system generates an interrupt signal fed to the communication controller which generates an interrupt signal fed to the management controller and a control signal fed to the microcontroller.
8. Method of operating a computer system comprising a plurality of sub-systems each running independently an operating system and a management controller coupled with the plurality of sub-systems, the method comprising the steps of: upon a system crash outputting a crash dump file through a serial port of the sub-system; buffering the crash dump file; generating a control signal for a management controller; upon request by the management controller coupling the management controller with the sub-system; and transferring the buffered crash dump file to the management controller.
9. Method according to claim 8, wherein the step of generating a control signal includes generating an interrupt signal fed to the management controller.
10. Method according to claim 8, wherein the step of generating a control signal includes sending a command to the management controller through a serial bus.
11. Method according to claim 8, wherein the step of coupling the management controller with the sub-system includes the step of coupling a serial output of the sub-system with a serial communication line coupled with the management controller through a switch.
12. Method according to claim 8, wherein the step of coupling the management controller with the sub-system includes the step of coupling the management controller and the sub-system through a serial bus and sending a command through a serial bus to request transmission of the crash dump file.
13. Computer system comprising: a plurality of independent sub-systems each running an operating system that outputs a crash dump through a serial port and generates a control signal upon a system crash; a management controller having a control input, a serial bus interface coupled with a communication line, and a serial input; wherein each sub-system comprises: a microcontroller having a control input, a memory, and a serial input port coupled with the serial port and a serial output port; a controller unit having a serial bus interface for coupling with the management controller and an input for receiving the control signal and generating an external control signal fed to the control input of the management controller and an output for an internal control signal fed to the microcontroller; a switch controlled by the controller unit for coupling the serial output port with the external communication line.
14. Computer system according to claim 12, wherein the control signal and the external control signal are interrupt signals.
15. Computer system according to claim 12, wherein the serial input and output ports are part of a RS232 serial interface.